
Conversation

@major (Contributor) commented Dec 15, 2025

Description

Wire up the /infer endpoint to Llama Stack for actual inference of RHEL Lightspeed requests (a rough sketch of the flow follows the list below):

  • Handle empty LLM responses with fallback message
  • Include request_id in all log statements for tracing
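A minimal sketch of that flow, assuming a FastAPI-style async handler; the constant, function names, and response shape below are illustrative stand-ins rather than the exact code in this PR:

```python
import logging
import uuid

logger = logging.getLogger(__name__)

# Stand-in for constants.UNABLE_TO_PROCESS_RESPONSE used by the real endpoint.
FALLBACK_MESSAGE = "Unable to process this request at the moment."


async def call_llm(question: str) -> str:
    """Placeholder for the Llama Stack call this PR adds (retrieve_simple_response)."""
    return ""  # an empty reply here exercises the fallback path below


async def infer(question: str) -> dict[str, str]:
    """Illustrative /infer flow: tag every log line with request_id, fall back on empty output."""
    request_id = str(uuid.uuid4())
    logger.info("request_id=%s received inference request", request_id)

    response_text = await call_llm(question)
    if not response_text:
        # Empty LLM response: return a fallback message instead of an empty body.
        logger.warning("request_id=%s LLM returned no content, using fallback", request_id)
        response_text = FALLBACK_MESSAGE

    logger.info("request_id=%s returning response", request_id)
    return {"request_id": request_id, "response": response_text}
```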

Type of change

  • Refactor
  • New feature
  • Bug fix
  • CVE fix
  • Optimization
  • Documentation Update
  • Configuration Update
  • Bump-up service version
  • Bump-up dependent library
  • Bump-up library or tool used for development (does not change the final image)
  • CI configuration change
  • Konflux configuration change
  • Unit tests improvement
  • Integration tests improvement
  • End to end tests improvement

Tools used to create PR

Identify any AI code assistants used in this PR (for transparency and review context)

  • Assisted-by: Claude

Related Tickets & Documents

  • Related Issue RSPEED-2229

Checklist before requesting a review

  • I have performed a self-review of my code.
  • PR has passed all pre-merge test jobs.
  • If it is a core feature, I have added thorough tests.

Testing

No changes to how tests are run.

Summary by CodeRabbit

  • New Features

    • Replaced placeholder inference with live LLM retrieval for user queries.
    • Added request/response logging and session-aware retrieval to improve reliability.
  • Bug Fixes

    • Returns HTTP 503 with a service-unavailable payload when the LLM backend is unreachable.
    • Provides a fallback response when the LLM yields no content.
    • Strips whitespace and validates blank questions before processing (see the validator sketch after this summary).
  • Tests

    • Expanded unit tests covering default model resolution, successful and empty LLM outputs, API errors, and validation.
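A hedged Pydantic sketch of the question validation mentioned under Bug Fixes; the real model is RlsapiV1InferRequest in src/models/rlsapi/requests.py, and the field and validator names here are illustrative:

```python
from pydantic import BaseModel, field_validator


class InferRequest(BaseModel):
    """Illustrative request model; the real one is RlsapiV1InferRequest."""

    question: str

    @field_validator("question")
    @classmethod
    def strip_and_require_question(cls, value: str) -> str:
        # Strip surrounding whitespace, then reject questions that end up blank.
        stripped = value.strip()
        if not stripped:
            raise ValueError("question must not be empty or whitespace-only")
        return stripped
```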


@coderabbitai bot (Contributor) commented Dec 15, 2025

Walkthrough

Added default-model resolution and a stateless retrieval flow for the /infer endpoint: new helpers fetch configured provider/model, create a temporary Llama Stack agent, send a UserMessage, extract the response (with safety/fallbacks), and map API connection/configuration failures to HTTP 503.
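A sketch of that retrieval flow. The helper names (AsyncLlamaStackClientHolder, get_temp_agent, content_to_str, _get_default_model_id) come from this PR, but the import paths, exact signatures, and create_turn arguments below are assumptions:

```python
from client import AsyncLlamaStackClientHolder        # src/client.py
from llama_stack_client.types import UserMessage
from utils.endpoints import get_temp_agent             # src/utils/endpoints.py
from utils.types import content_to_str                 # src/utils/types.py


async def retrieve_simple_response(question: str) -> str:
    """Send one UserMessage through a temporary agent and return the reply text."""
    client = AsyncLlamaStackClientHolder().get_client()
    # _get_default_model_id() is the helper added alongside this function in the PR;
    # it raises HTTPException(503) when no model is configured.
    model_id = _get_default_model_id()

    # Temporary, per-request agent; the return shape of get_temp_agent is assumed here.
    agent, session_id = await get_temp_agent(client, model_id)

    turn = await agent.create_turn(
        messages=[UserMessage(role="user", content=question)],
        session_id=session_id,
        stream=False,
    )

    # Defensive extraction: missing attributes fall through to an empty string,
    # which the endpoint later replaces with constants.UNABLE_TO_PROCESS_RESPONSE.
    output = getattr(turn, "output_message", None)
    content = getattr(output, "content", None)
    return content_to_str(content) if content else ""
```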

Changes

  • Endpoint implementation (src/app/endpoints/rlsapi_v1.py): Added _get_default_model_id() to read provider/model from configuration and raise 503 on misconfiguration; added retrieve_simple_response(question: str) to build a temporary agent, send a UserMessage, and extract response text; wired APIConnectionError handling and logging; updated infer_responses to include 503 and integrated retrieval flow with fallback to constants.UNABLE_TO_PROCESS_RESPONSE.
  • Unit tests (tests/unit/app/endpoints/test_rlsapi_v1.py): Expanded tests with fixtures mocking configuration, LLM success/empty responses, and APIConnectionError; added tests for default-model resolution (success/failure), retrieve_simple_response (success/empty/error), /infer endpoint behaviors (minimal/full requests, unique request_id, whitespace stripping, input validation), and 503 handling.
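A hedged pytest sketch of the default-model resolution tests summarized above; the real tests live in tests/unit/app/endpoints/test_rlsapi_v1.py, and the import path, fixture usage, and example model names here are assumptions:

```python
import pytest
from fastapi import HTTPException

from app.endpoints import rlsapi_v1  # import path assumed


def test_get_default_model_id_success(mocker):
    """Provider and model from configuration are combined into 'provider/model'."""
    mock_config = mocker.Mock()
    mock_config.inference.default_provider = "openai"
    mock_config.inference.default_model = "gpt-4o-mini"
    mocker.patch.object(rlsapi_v1, "configuration", mock_config)

    assert rlsapi_v1._get_default_model_id() == "openai/gpt-4o-mini"


def test_get_default_model_id_missing_config(mocker):
    """A missing inference configuration is surfaced as HTTP 503."""
    mock_config = mocker.Mock()
    mock_config.inference = None
    mocker.patch.object(rlsapi_v1, "configuration", mock_config)

    with pytest.raises(HTTPException) as exc_info:
        rlsapi_v1._get_default_model_id()
    assert exc_info.value.status_code == 503
```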

Sequence Diagram

sequenceDiagram
    actor Client
    participant Endpoint as /infer Endpoint
    participant Config as Configuration
    participant Agent as Temp Agent (Llama Stack)
    participant LLM as LLM Service

    Client->>Endpoint: POST /infer (question)
    Endpoint->>Endpoint: Validate & strip question
    Endpoint->>Config: Resolve default model/provider
    alt Config failure
        Config-->>Endpoint: Error
        Endpoint-->>Client: HTTP 503 (ServiceUnavailable)
    else Config success
        Config-->>Endpoint: provider:model
        Endpoint->>Agent: Create temporary agent
        Endpoint->>Agent: Send UserMessage(question)
        Agent->>LLM: Forward user message
        LLM-->>Agent: Return Turn/response
        Agent-->>Endpoint: Response content
        alt Content present
            Endpoint-->>Client: HTTP 200 + LLM response
        else Empty content
            Endpoint-->>Client: HTTP 200 + UNABLE_TO_PROCESS_RESPONSE
        end
    end

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

  • Check correct mapping of configuration errors and APIConnectionError to HTTP 503
  • Verify temporary agent creation/teardown and async handling in retrieve_simple_response
  • Review response extraction and fallback to constants.UNABLE_TO_PROCESS_RESPONSE
  • Inspect new unit tests for adequate coverage and realistic mocks

Suggested labels

ok-to-test

Suggested reviewers

  • tisnik

Pre-merge checks

✅ Passed checks (3 passed)

  • Description Check (✅ Passed): Check skipped - CodeRabbit's high-level summary is enabled.
  • Title Check (✅ Passed): The title 'feat(rlsapi): implement LLM integration for v1 /infer endpoint' accurately and specifically describes the main change: wiring the /infer endpoint to Llama Stack for actual LLM inference.
  • Docstring Coverage (✅ Passed): Docstring coverage is 100.00%, which is sufficient; the required threshold is 80.00%.


@openshift-ci bot commented Dec 15, 2025

Hi @major. Thanks for your PR.

I'm waiting for a github.com member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

Wire up the /infer endpoint to Llama Stack for actual inference:

- Handle empty LLM responses with fallback message
- Include request_id in all log statements for tracing

Signed-off-by: Major Hayden <major@redhat.com>
@coderabbitai bot (Contributor) left a comment

Actionable comments posted: 0

🧹 Nitpick comments (2)
tests/unit/app/endpoints/test_rlsapi_v1.py (1)

46-75: Consider using AsyncMock for async method mocks.

mock_agent.create_turn is awaited in the implementation, but the mock uses a regular Mock. While Python 3.8+ handles this, using AsyncMock is more explicit and aligns with the coding guidelines.

 def mock_llm_response_fixture(
     mocker: MockerFixture, prepare_agent_mocks: AgentFixtures
 ) -> None:
     """Mock the LLM integration for successful responses."""
     mock_client, mock_agent = prepare_agent_mocks
+    
+    # Use AsyncMock for async methods
+    mock_agent.create_turn = mocker.AsyncMock()

     # Create mock output message with content
     mock_output_message = mocker.Mock()
     mock_output_message.content = "This is a test LLM response."

     # Create mock turn response
     mock_turn = mocker.Mock()
     mock_turn.output_message = mock_output_message
     mock_turn.steps = []
     mock_agent.create_turn.return_value = mock_turn

As per coding guidelines, async methods should use AsyncMock for mocking.
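For reference, a minimal pytest-mock pattern for mocking an awaited method with AsyncMock (fixture and attribute names here are illustrative, and pytest-asyncio is assumed for the async test):

```python
import pytest


@pytest.fixture(name="mock_agent")
def mock_agent_fixture(mocker):
    """Agent whose awaited create_turn returns a canned turn object."""
    mock_turn = mocker.Mock()
    mock_turn.output_message.content = "This is a test LLM response."
    mock_turn.steps = []

    agent = mocker.Mock()
    # AsyncMock makes `await agent.create_turn(...)` work and records the await.
    agent.create_turn = mocker.AsyncMock(return_value=mock_turn)
    return agent


@pytest.mark.asyncio
async def test_create_turn_is_awaited(mock_agent):
    """The awaited mock returns the canned content and records the call."""
    turn = await mock_agent.create_turn(messages=[], session_id="s1")
    assert turn.output_message.content == "This is a test LLM response."
    mock_agent.create_turn.assert_awaited_once()
```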

src/app/endpoints/rlsapi_v1.py (1)

82-119: Consider documenting HTTPException in the Raises section.

The function can also raise HTTPException (503) via _get_default_model_id() if configuration is missing. Adding this to the docstring would provide complete documentation of possible exceptions.

     Raises:
         APIConnectionError: If the Llama Stack service is unreachable.
+        HTTPException: 503 if no model is configured.
     """
📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 2a2a30f and 564434e.

📒 Files selected for processing (2)
  • src/app/endpoints/rlsapi_v1.py (4 hunks)
  • tests/unit/app/endpoints/test_rlsapi_v1.py (4 hunks)
🧰 Additional context used
📓 Path-based instructions (5)
tests/{unit,integration}/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

tests/{unit,integration}/**/*.py: Use pytest for all unit and integration tests; do not use unittest framework
Unit tests must achieve 60% code coverage; integration tests must achieve 10% coverage

Files:

  • tests/unit/app/endpoints/test_rlsapi_v1.py
tests/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

Use pytest-mock with AsyncMock objects for mocking in tests

Files:

  • tests/unit/app/endpoints/test_rlsapi_v1.py
src/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

src/**/*.py: Use absolute imports for internal modules in LCS project (e.g., from auth import get_auth_dependency)
All modules must start with descriptive docstrings explaining their purpose
Use logger = logging.getLogger(__name__) pattern for module logging
All functions must include complete type annotations for parameters and return types, using modern syntax (str | int) and Optional[Type] or Type | None
All functions must have docstrings with brief descriptions following Google Python docstring conventions
Function names must use snake_case with descriptive, action-oriented names (get_, validate_, check_)
Avoid in-place parameter modification anti-patterns; return new data structures instead of modifying input parameters
Use async def for I/O operations and external API calls
All classes must include descriptive docstrings explaining their purpose following Google Python docstring conventions
Class names must use PascalCase with descriptive names and standard suffixes: Configuration for config classes, Error/Exception for exceptions, Resolver for strategy patterns, Interface for abstract base classes
Abstract classes must use ABC with @abstractmethod decorators
Include complete type annotations for all class attributes in Python classes
Use import logging and module logger pattern with standard log levels: debug, info, warning, error

Files:

  • src/app/endpoints/rlsapi_v1.py
src/app/endpoints/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

Use FastAPI HTTPException with appropriate status codes for API endpoint error handling

Files:

  • src/app/endpoints/rlsapi_v1.py
src/**/{client,app/endpoints/**}.py

📄 CodeRabbit inference engine (CLAUDE.md)

Handle APIConnectionError from Llama Stack in integration code

Files:

  • src/app/endpoints/rlsapi_v1.py
🧬 Code graph analysis (2)
tests/unit/app/endpoints/test_rlsapi_v1.py (4)
src/app/endpoints/rlsapi_v1.py (3)
  • _get_default_model_id (49-79)
  • infer_endpoint (124-182)
  • retrieve_simple_response (82-119)
src/configuration.py (3)
  • configuration (73-77)
  • AppConfig (39-181)
  • inference (134-138)
src/utils/suid.py (1)
  • check_suid (19-54)
src/models/rlsapi/requests.py (1)
  • RlsapiV1InferRequest (123-200)
src/app/endpoints/rlsapi_v1.py (5)
src/client.py (2)
  • AsyncLlamaStackClientHolder (18-55)
  • get_client (49-55)
src/models/config.py (1)
  • Action (755-808)
src/models/rlsapi/requests.py (1)
  • RlsapiV1InferRequest (123-200)
src/utils/endpoints.py (1)
  • get_temp_agent (387-423)
src/utils/types.py (1)
  • content_to_str (20-40)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: build-pr
🔇 Additional comments (12)
tests/unit/app/endpoints/test_rlsapi_v1.py (8)

1-32: LGTM!

Imports are well-organized, and the pylint disables are appropriate for testing private functions and fixture-based side effects.


35-43: LGTM!

The configuration fixture correctly extends the minimal config and patches it at the point of use.


78-103: LGTM!

The fixture correctly simulates an empty LLM response for testing the fallback path. The same AsyncMock suggestion from the previous fixture applies here as well.


106-116: LGTM!

The fixture properly simulates APIConnectionError for testing error handling. While the real exception would originate from network operations rather than get_client(), this approach is valid for unit testing the exception handling flow.
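A sketch of how that exception path can be asserted at the endpoint level; the infer_endpoint call signature is simplified here, and the patched names are assumptions based on this PR:

```python
import pytest
from fastapi import HTTPException
from llama_stack_client import APIConnectionError

from app.endpoints import rlsapi_v1  # import path assumed


@pytest.mark.asyncio
async def test_infer_returns_503_when_backend_unreachable(mocker):
    """APIConnectionError raised during retrieval should surface as HTTP 503."""
    mocker.patch.object(
        rlsapi_v1,
        "retrieve_simple_response",
        new=mocker.AsyncMock(
            side_effect=APIConnectionError(request=mocker.Mock())  # type: ignore
        ),
    )
    infer_request = mocker.Mock()
    infer_request.question = "How do I check the SELinux status?"

    with pytest.raises(HTTPException) as exc_info:
        # The real endpoint may take additional dependencies (auth, etc.).
        await rlsapi_v1.infer_endpoint(infer_request)
    assert exc_info.value.status_code == 503
```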


119-163: LGTM!

Good test coverage for _get_default_model_id() with parameterized error cases. The tests properly verify both success and failure paths with appropriate assertions.


166-193: LGTM!

Comprehensive async tests for retrieve_simple_response() covering success, empty output, and error propagation scenarios.


196-279: LGTM!

Excellent test coverage for infer_endpoint() including minimal/full requests, unique request ID generation, 503 error handling, and fallback behavior for empty LLM responses.


282-295: LGTM!

Good validation tests ensuring the request model properly rejects empty/whitespace questions and strips whitespace from valid questions.

src/app/endpoints/rlsapi_v1.py (4)

1-35: LGTM!

Module setup follows coding guidelines with proper docstring, logger pattern, and organized imports.


38-46: LGTM!

The 503 response correctly documents the service unavailability scenario in the OpenAPI schema.


49-79: LGTM!

Well-structured helper function with proper error handling, descriptive docstring, and appropriate HTTP 503 responses for configuration issues.


122-182: LGTM!

Solid endpoint implementation with:

  • Proper async handling and type annotations
  • Unique request_id generation for tracing
  • Comprehensive error handling (503 for APIConnectionError)
  • Fallback for empty LLM responses
  • Good logging with request_id context throughout

Signed-off-by: Major Hayden <major@redhat.com>
@coderabbitai bot (Contributor) left a comment

Actionable comments posted: 0

🧹 Nitpick comments (1)
src/app/endpoints/rlsapi_v1.py (1)

49-80: Use ServiceUnavailableResponse for consistency.

The function manually constructs HTTPException with a custom detail dict, while infer_endpoint uses ServiceUnavailableResponse for similar 503 errors. This inconsistency makes error handling harder to maintain and may result in different response formats.

Apply this diff to use ServiceUnavailableResponse consistently:

 def _get_default_model_id() -> str:
     """Get the default model ID from configuration.
 
     Returns the model identifier in Llama Stack format (provider/model).
 
     Returns:
         The model identifier string.
 
     Raises:
         HTTPException: If no model can be determined from configuration.
     """
     if configuration.inference is None:
         msg = "No inference configuration available"
         logger.error(msg)
-        raise HTTPException(
-            status_code=503,
-            detail={"response": "Service configuration error", "cause": msg},
-        )
+        response = ServiceUnavailableResponse(
+            backend_name="Configuration",
+            cause=msg,
+        )
+        raise HTTPException(**response.model_dump())
 
     model_id = configuration.inference.default_model
     provider_id = configuration.inference.default_provider
 
     if model_id and provider_id:
         return f"{provider_id}/{model_id}"
 
     msg = "No default model configured for rlsapi v1 inference"
     logger.error(msg)
-    raise HTTPException(
-        status_code=503,
-        detail={"response": "Service configuration error", "cause": msg},
-    )
+    response = ServiceUnavailableResponse(
+        backend_name="Configuration",
+        cause=msg,
+    )
+    raise HTTPException(**response.model_dump())
📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 564434e and f0d9dbc.

📒 Files selected for processing (2)
  • src/app/endpoints/rlsapi_v1.py (4 hunks)
  • tests/unit/app/endpoints/test_rlsapi_v1.py (4 hunks)
🧰 Additional context used
📓 Path-based instructions (5): same path-based instructions as in the previous review.
🧬 Code graph analysis (1)
tests/unit/app/endpoints/test_rlsapi_v1.py (1)
src/app/endpoints/rlsapi_v1.py (3)
  • _get_default_model_id (49-79)
  • infer_endpoint (125-183)
  • retrieve_simple_response (82-120)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: build-pr
🔇 Additional comments (8)
tests/unit/app/endpoints/test_rlsapi_v1.py (5)

3-4: LGTM: Appropriate pylint disables for unit tests.

The disables for protected-access and unused-argument are standard for unit tests that need to test private functions and use fixtures.


35-43: LGTM: Well-structured configuration fixture.

The fixture properly extends the minimal config with inference defaults and patches it into the module under test.


126-198: LGTM: Comprehensive test coverage for helper functions.

The tests cover success paths, error conditions, and edge cases for both _get_default_model_id and retrieve_simple_response. Good use of parametrization and proper async test markers.


203-299: LGTM: Thorough endpoint and validation testing.

The test suite comprehensively covers the endpoint behavior including:

  • Minimal and full-context requests
  • Unique request ID generation
  • API connection error handling (503 response)
  • Empty LLM response fallback
  • Input validation and whitespace stripping

110-121: Add a # type: ignore comment to the APIConnectionError instantiation for type-checking consistency.

This fixture instantiates APIConnectionError(request=mocker.Mock()) without the # type: ignore comment that all other similar usages in the codebase include. Add the comment on lines 114-115 to align with the pattern used throughout the test suite (e.g., test_info.py, test_models.py, test_rags.py).

⛔ Skipped due to learnings
Learnt from: CR
Repo: lightspeed-core/lightspeed-stack PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-11-24T16:58:04.410Z
Learning: Applies to src/**/{client,app/endpoints/**}.py : Handle `APIConnectionError` from Llama Stack in integration code
Learnt from: CR
Repo: lightspeed-core/lightspeed-stack PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-11-24T16:58:04.410Z
Learning: Applies to tests/**/*.py : Use `pytest-mock` with AsyncMock objects for mocking in tests
src/app/endpoints/rlsapi_v1.py (3)

8-46: LGTM: Imports and response documentation updated appropriately.

The new imports support LLM integration and the infer_responses mapping now includes the 503 ServiceUnavailableResponse for API unavailability scenarios.


82-121: LGTM: Well-structured stateless retrieval function.

The function properly handles LLM communication with defensive checks for missing attributes. The use of getattr prevents AttributeError when the response structure is incomplete.


123-183: LGTM: Robust endpoint implementation with proper error handling.

The endpoint correctly:

  • Generates unique request IDs for tracing
  • Handles APIConnectionError by returning HTTP 503 with ServiceUnavailableResponse
  • Provides fallback text for empty LLM responses
  • Includes comprehensive logging at all stages

As per coding guidelines, the endpoint properly handles APIConnectionError from Llama Stack.
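A condensed sketch of that mapping; the ServiceUnavailableResponse arguments mirror the review diff above, while its import path and the surrounding names are assumptions:

```python
import logging
from collections.abc import Awaitable, Callable

from fastapi import HTTPException
from llama_stack_client import APIConnectionError

from models.responses import ServiceUnavailableResponse  # import path assumed

logger = logging.getLogger(__name__)


async def with_503_mapping(
    retrieve: Callable[[str], Awaitable[str]], question: str, request_id: str
) -> str:
    """Translate an unreachable Llama Stack backend into HTTP 503.

    `retrieve` stands in for the retrieve_simple_response helper added in this PR.
    """
    try:
        return await retrieve(question)
    except APIConnectionError as exc:
        logger.error("request_id=%s Llama Stack backend unreachable: %s", request_id, exc)
        response = ServiceUnavailableResponse(backend_name="Llama Stack", cause=str(exc))
        raise HTTPException(**response.model_dump()) from exc
```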

@tisnik (Contributor) left a comment

Seems legit, thank you

@tisnik (Contributor) commented Dec 15, 2025

/ok-to-test

@tisnik merged commit 5e66f50 into lightspeed-core:main on Dec 16, 2025
19 of 26 checks passed